

Dynamics of Stochastic Momentum Methods on Large-scale, Quadratic Models

Neural Information Processing Systems

We analyze a class of stochastic gradient algorithms with momentum on a high-dimensional random least squares problem. Our framework, inspired by random matrix theory, provides an exact (deterministic) characterization for the sequence of function values produced by these algorithms, expressed only in terms of the eigenvalues of the Hessian. This leads to simple expressions for nearly-optimal hyperparameters, a description of the limiting neighborhood, and average-case complexity. As a consequence, we show that (small-batch) stochastic heavy-ball momentum with a fixed momentum parameter provides no actual performance improvement over SGD when step sizes are adjusted correctly. In contrast, in the non-strongly convex setting, it is possible to get a large improvement over SGD using momentum. By introducing hyperparameters that depend on the number of samples, we propose a new algorithm sDANA (stochastic dimension adjusted Nesterov acceleration) which obtains an asymptotically optimal average-case complexity while remaining linearly convergent in the strongly convex setting without adjusting parameters.
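The setting can be sketched in a few lines. This is a minimal illustration, not the paper's experiments: the problem sizes, step size, and momentum parameter below are invented for the example, and the design matrix is a generic Gaussian rather than the paper's precise random-matrix model.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 50
A = rng.standard_normal((n, d))          # random design matrix
x_star = rng.standard_normal(d)
b = A @ x_star                           # noiseless targets (interpolation holds)

def loss(x):
    r = A @ x - b
    return 0.5 * np.mean(r * r)

def sgd_heavy_ball(steps=1500, lr=0.01, beta=0.9, batch=10):
    """Small-batch SGD with a fixed heavy-ball momentum parameter."""
    x, v = np.zeros(d), np.zeros(d)
    history = [loss(x)]
    for _ in range(steps):
        idx = rng.integers(0, n, size=batch)
        # unbiased stochastic gradient of the mean squared loss
        g = A[idx].T @ (A[idx] @ x - b[idx]) / batch
        v = beta * v + g                 # momentum buffer, fixed beta
        x = x - lr * v
        history.append(loss(x))
    return x, history

x_hb, hist = sgd_heavy_ball()
```

The paper's claim is about this regime: with the step size tuned correctly, the trajectory of `loss` values here matches plain SGD's in the high-dimensional limit.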



Low-dimensional models of neural population activity in sensory cortical circuits

Evan W. Archer, Urs Koster, Jonathan W. Pillow, Jakob H. Macke

Neural Information Processing Systems

Neural responses in visual cortex are influenced by visual stimuli and by ongoing spiking activity in local circuits. An important challenge in computational neuroscience is to develop models that can account for both of these features in large multi-neuron recordings and to reveal how stimulus representations interact with and depend on cortical dynamics. Here we introduce a statistical model of neural population activity that integrates a nonlinear receptive field model with a latent dynamical model of ongoing cortical activity. This model captures temporal dynamics and correlations due to shared stimulus drive as well as common noise. Moreover, because the nonlinear stimulus inputs are mixed by the ongoing dynamics, the model can account for multiple idiosyncratic receptive field shapes with a small number of nonlinear inputs to a low-dimensional dynamical model. We introduce a fast estimation method using online expectation maximization with Laplace approximations, for which inference scales linearly in both population size and recording duration. We apply this model to multi-channel recordings from primary visual cortex and show that it accounts for neural tuning properties as well as cross-neural correlations.
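The generative structure described here can be sketched schematically. All dimensions, nonlinearities, and parameter scales below are invented for illustration; the point is only the architecture: nonlinear stimulus inputs are mixed by low-dimensional latent linear dynamics, which in turn drive Poisson spiking.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n_neurons, n_latent, n_inputs = 500, 30, 3, 4

# stable latent linear dynamics (spectral radius 0.9)
A = 0.9 * np.linalg.qr(rng.standard_normal((n_latent, n_latent)))[0]
B = 0.5 * rng.standard_normal((n_latent, n_inputs))   # mixes stimulus inputs into latents
C = 0.5 * rng.standard_normal((n_neurons, n_latent))  # loadings onto neurons
d0 = -1.0 * np.ones(n_neurons)                        # baseline log rates

stim = rng.standard_normal((T, n_inputs))
f = np.tanh(stim)            # stand-in for the nonlinear receptive-field outputs

z = np.zeros(n_latent)
spikes = np.zeros((T, n_neurons), dtype=int)
for t in range(T):
    z = A @ z + B @ f[t] + 0.1 * rng.standard_normal(n_latent)  # latent dynamics + noise
    rate = np.exp(C @ z + d0)                                    # log-linear firing rates
    spikes[t] = rng.poisson(rate)
```

Because every neuron's rate depends on the shared latents, the simulated population exhibits both stimulus-driven and noise correlations, which is the phenomenon the model is built to capture.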



Enhancing Trust-Region Bayesian Optimization via Newton Methods

Chen, Quanlin, Chen, Yiyu, Huo, Jing, Ding, Tianyu, Gao, Yang, Chen, Yuetong

arXiv.org Machine Learning

Bayesian Optimization (BO) has been widely applied to optimize expensive black-box functions while retaining sample efficiency. However, scaling BO to high-dimensional spaces remains challenging. Existing literature proposes performing standard BO in multiple local trust regions (TuRBO) for heterogeneous modeling of the objective function and avoiding over-exploration. Despite its advantages, using local Gaussian Processes (GPs) reduces sampling efficiency compared to a global GP. To enhance sampling efficiency while preserving heterogeneous modeling, we propose to construct multiple local quadratic models using gradients and Hessians from a global GP, and to select new sample points by solving a bound-constrained quadratic program. Additionally, we address the issue of vanishing gradients of GPs in high-dimensional spaces. We provide a convergence analysis and demonstrate through experimental results that our method enhances the efficacy of TuRBO and outperforms a wide range of high-dimensional BO techniques on synthetic functions and real-world applications.
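The bound-constrained quadratic-program step can be sketched with a generic projected-gradient solver. This is not the paper's solver: the gradient `g` and Hessian `H` below merely stand in for derivatives of a global GP posterior mean, and the sketch assumes a positive-semidefinite `H` (a convex local model), which need not hold for a real GP Hessian.

```python
import numpy as np

def solve_box_qp(g, H, radius, iters=1000):
    """Minimize m(s) = g@s + 0.5*s@H@s subject to |s_i| <= radius.

    Projected gradient with a 1/L step size; assumes H is positive
    semidefinite so the local model is convex."""
    L = np.linalg.norm(H, 2) + 1e-12     # Lipschitz constant of the model gradient
    s = np.zeros_like(g)
    for _ in range(iters):
        s = s - (g + H @ s) / L          # gradient step on the quadratic model
        s = np.clip(s, -radius, radius)  # project onto the trust-region box
    return s

# toy local model standing in for GP-derived derivatives at a trust-region center
rng = np.random.default_rng(4)
M = rng.standard_normal((5, 5))
H = M @ M.T + np.eye(5)                  # positive definite Hessian
g = rng.standard_normal(5)
s_star = solve_box_qp(g, H, radius=0.3)
```

The returned step `s_star` is the candidate the trust-region loop would evaluate next; the box radius plays the role of the trust-region size.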


Scaling Law for Stochastic Gradient Descent in Quadratically Parameterized Linear Regression

Ding, Shihong, Zhang, Haihan, Zhao, Hanzhen, Fang, Cong

arXiv.org Artificial Intelligence

In machine learning, the scaling law describes how model performance improves as model and data size scale up. From a learning theory perspective, this class of results establishes upper and lower generalization bounds for a specific learning algorithm. Here, the exact algorithm run under a specific model parameterization often provides a crucial implicit regularization effect, leading to good generalization. To characterize the scaling law, previous theoretical studies mainly focus on linear models, whereas feature learning, a process central to the remarkable empirical success of neural networks, remains largely unaddressed. This paper studies the scaling law for linear regression with a quadratically parameterized model. We consider infinite-dimensional data and ground-truth signal, both exhibiting certain power-law decay rates. We study convergence rates for Stochastic Gradient Descent and demonstrate that the learning rates for the variables automatically adapt to the ground truth. As a result, in the canonical linear regression, we provide explicit separations between the generalization curves of SGD with and without feature learning, and the information-theoretic lower bound that is agnostic to the parameterization method and the algorithm. Our analysis of decaying ground truth provides a new characterization of the model's learning dynamics.
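A finite-dimensional toy version of the parameterization makes the mechanism concrete. Everything here is an illustrative simplification: the paper's setting is infinite-dimensional with power-law decay, whereas this sketch uses a sparse ground truth and the nonnegative form θ = u ⊙ u, whose small initialization is what produces the implicit-regularization effect.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 1000, 100
theta_star = np.zeros(d)
theta_star[:5] = 1.0                     # fast-decaying (here: sparse) ground truth
X = rng.standard_normal((n, d))
y = X @ theta_star                       # noiseless linear regression data

u = np.full(d, 0.1)                      # small init: source of implicit regularization
lr = 0.001
for _ in range(20000):
    i = rng.integers(n)                  # single-sample SGD
    resid = X[i] @ (u * u) - y[i]        # quadratically parameterized model: theta = u ⊙ u
    u -= lr * 2.0 * u * (resid * X[i])   # chain rule through theta = u**2
theta_hat = u * u
```

Coordinates aligned with the signal grow multiplicatively while the rest stay near zero, so the per-coordinate effective learning rate adapts to the ground truth, which is the adaptation phenomenon the abstract refers to.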



Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

Summary: The paper introduces an algorithm (MORE) for black box optimization that constructs local "quadratic surrogate" models based on recent observations. By using a (simpler than the true function) quadratic model, the iterative step of refining the reward parameters can be computed in closed form (though the models themselves seem to be built with a form of sampling). This approach allows for a more sample efficient and robust search procedure, which is shown to outperform state-of-the-art methods in terms of samples and converged parameters on a number of simple functions as well as some very complex robotics tasks. Review: The new algorithm performs much better than the state-of-the-art algorithms in a wide range of experiments and is applicable in a very important problem setting. I appreciate the wide range of problems that the authors used and I think they make a very strong case for the new algorithm.
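The surrogate-building step the summary describes can be sketched generically: a plain least-squares fit of a quadratic model to recent observations. MORE itself additionally maintains a Gaussian search distribution and refines it with a KL-constrained closed-form update, which is omitted here.

```python
import numpy as np

def fit_quadratic_surrogate(X, f):
    """Least-squares fit of f(x) ≈ c + b@x + 0.5*x@A@x from samples (X: n×d)."""
    n, d = X.shape
    iu = np.triu_indices(d)
    # features: 1, x_i, and the upper-triangular monomials x_i * x_j (i <= j)
    quad = np.stack([X[:, i] * X[:, j] for i, j in zip(*iu)], axis=1)
    Phi = np.hstack([np.ones((n, 1)), X, quad])
    w, *_ = np.linalg.lstsq(Phi, f, rcond=None)
    c, b = w[0], w[1:1 + d]
    A = np.zeros((d, d))
    A[iu] = w[1 + d:]
    A = A + A.T                          # doubles the diagonal, as 0.5*x@A@x requires
    return A, b, c

# recover a known quadratic from noise-free evaluations
rng = np.random.default_rng(5)
d = 4
H = np.diag([1.0, 2.0, 3.0, 4.0])
g = np.array([1.0, -1.0, 0.5, 2.0])
X = rng.standard_normal((100, d))
f = 0.5 * np.einsum('ni,ij,nj->n', X, H, X) + X @ g + 3.0
A_hat, b_hat, c_hat = fit_quadratic_surrogate(X, f)
x_min = -np.linalg.solve(A_hat, b_hat)   # closed-form minimizer of the surrogate
```

This is the sense in which the quadratic model buys a closed form: once `A_hat` and `b_hat` are in hand, the model's minimizer (or a distribution update built from them) needs no further function evaluations.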


Data-driven system identification using quadratic embeddings of nonlinear dynamics

Klus, Stefan, N'Konzi, Joel-Pascal

arXiv.org Machine Learning

We propose a novel data-driven method called QENDy (Quadratic Embedding of Nonlinear Dynamics) that not only allows us to learn quadratic representations of highly nonlinear dynamical systems, but also to identify the governing equations. The approach is based on an embedding of the system into a higher-dimensional feature space in which the dynamics become quadratic. Just like SINDy (Sparse Identification of Nonlinear Dynamics), our method requires trajectory data, time derivatives at the training data points (which can also be estimated using finite-difference approximations), and a set of preselected basis functions, called a dictionary. We illustrate the efficacy and accuracy of QENDy with the aid of various benchmark problems and compare its performance with SINDy and a deep learning method for identifying quadratic embeddings. Furthermore, we analyze the convergence of QENDy and SINDy in the infinite data limit, highlight their similarities and main differences, and compare the quadratic embedding with linearization techniques based on the Koopman operator.
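The regression step shared by QENDy and SINDy can be illustrated on a system that is already quadratic, the logistic ODE x' = x − x²; for a general system, QENDy would first embed the state so that the lifted dynamics become quadratic. The dictionary below is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(3)
# trajectory data from the logistic ODE  x' = x - x^2  (already quadratic)
x = rng.uniform(0.05, 0.95, size=200)
dx = x - x**2                      # assume exact time derivatives are available

# dictionary of preselected basis functions: constant, linear, quadratic
Phi = np.stack([np.ones_like(x), x, x**2], axis=1)
coef, *_ = np.linalg.lstsq(Phi, dx, rcond=None)
# coef recovers [0, 1, -1]: the governing equation x' = x - x^2
```

With noise-free derivatives the least-squares fit recovers the governing equation exactly; in practice the derivatives would be finite-difference estimates and the fit only approximate.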

